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Abstract. This paper describes a tool for intersecting context-free gram- 
mars. Since this problem is undecidable the tool follows a refinement- 
based approach and implements a novel refinement which is complete 
for regularly separable grammars. We show its effectiveness for safety 
verification of recursive multi-threaded programs. 


1 Introduction 

Checking emptiness of intersections of context-free grammars is a well-known 
undecidable problem. However, the fact that this problem is equivalent to safety 
verification of recursive multi-threaded programs has kept motivating the design 
of semi-decision procedures that can still be effective in practice. 

In this paper, we describe COVENANT, a tool for checking whether the lan- 
guages of an arbitrary number of context-free grammars are disjoint and show 
its role as a component in the analysis of recursive multi-threaded programs. 
The tool takes a grammatical approach [7], in the sense that it is formalized 
in terms of context-free grammars rather than pushdown automata [1, 2, 10]. It 
implements a counter-example guided abstraction refinement (CEGAR) of reg- 
ular over-approximations and integrates a complete refinement procedure that 
guarantees termination if the context-free grammars are regularly separable . 3 We 
show its application to safety verification of recursive multi-threaded programs. 

To the best of our knowledge, our tool is the only publicly available implemen- 
tation tackling the problem of intersecting unbounded context-free grammars. 

2 Approach 

The tool discussed in this paper follows the so-called counter-example guided 
abstraction refinement (CEGAR) of regular over-approximations. Without loss 
of generality, in this presentation we consider the intersection of just two context- 
free grammars G\ and Gi . The scheme is based on an initial abstraction which is 
repeatedly refined until either the languages are proven disjoint, an intersection 
witness has been found, or resources have been exhausted: 

3 Two context-free grammars Gi and Gi are regularly separable if there exist two 
regular languages Li and L 2 such that C(G\) C Li, C{G 2 ) C L 2 and L\ fl L 2 = 0. 



1. Abstraction: compute regular approximations R± and R 2 such that £(Gi) C 
£{Rx) and £(G 2 ) C £(R 2 ). 

2. Verification: using a decision procedure for regular languages if £(i?i) fl 
£{R 2 ) = 0 then £(Gi) fl £(G 2 ) = 0, so answer “the languages are disjoint.” 

If w € (£(Ri) fl £(R 2 )), w € £(G 1), and w £ £(G 2 ) then £(Gi) fl £(G 2 ) 7^ 

0, so answer “the languages are not disjoint” and provide w as a witness. 
Otherwise, go to step 3. 

3. Refinement: produce new regular approximations R[ and R' 2 such that for 
each Rt, i £ {1, 2}, we have £{Gi) C £{R'f) C £{Rf), and £(R' i ) C £{Ri) for 
some i. Update the approximations Ri <— R[,R 2 <— R 2 , and go to step 2. 

Abstraction. Note that a regular approximation always exists for any grammar 
G since we can use £* , where £ is the alphabet of G. However, the precision of 
the initial abstraction often has a significant impact on the convergence of the 
refinement loop, so non-trivial initial abstractions such as the i th - prefix abstrac- 
tion [1,2] and the downward closure with a cycle-breaking heuristic [7] are more 
suitable candidates. 

Verification. This step assumes a decision procedure that returns “no” if £{Afi)C\ 
£{A 2 ) = 0 or returns a witness toifwg £{A\) fl £{A 2 ) 7^ 0, where A\ and A 2 
are finite-state automata recognizing regular languages i?i and R 2 , respectively 
(that is, £{A\) = R\ and £{A 2 ) = R 2 ). This can be solved using, for instance, 
the classical product construction. Note that a different approach would make 
use of the fact that the class of context-free languages is closed under intersec- 
tion with regular languages. However, one advantage of our approach is that we 
are able to leverage the latest advances made in string solving [4,6, 11]. 
Refinement. At this point, the regular solver has found some witness w such 
that w € {£{A\) fl £(A 2 )), but w £ (£(Gi) fl £{G 2 )). There are three cases: (f ) 
w £ £(Gi) Aw £ £(G 2 ), (2) w ^ £(Gi) A w £ £(G 2 ), and (3) w € £(G 1) A w (fc 
£{G 2 ). For (f ) and (3) we should refine A\ and A 2 , respectively. For (2) we could 
choose to refine either A\ or A 2 , or both, covenant aggressively refines both. 

We say a language £ is a safe generalization of a witness w with respect 
to a context-free grammar G if (a) L D {w} and (b) L fl £(G) = 0. If w £ 
£(Gi) then a straightforward refinement is to produce a new abstraction that 
recognizes £(A,)\{rc} in place of A, . However, that refinement process will rarely 
converge, as it excludes only finitely many examples. Instead, we would like to 
produce safe generalizations of w containing an infinite number of words, to 
hasten convergence. For this purpose, our tool implements the concept of star- 
generalization [5]. Informally, a star- generalization of a word w is a language 
that applies the Kleene * operator (i.e., unbounded repetition) to any number 
of non-overlapping, but possibly nested, subsequences of w, while ensuring the 
resulting augmented language remains disjoint with the language of G. 

3 Covenant 

The tool is publicly available at https : / /bitbucket . org/jorgenavas/covenant. 
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3.1 Design and Implementation Choices 

The tool is implemented in CH — b and parameterized by the initial approximation, 
the regular solver, and the refinement procedure. 

Abstraction. One advantage of having a grammatical view is that COVENANT 
can easily leverage the advances made in areas such as speech processing where 
precise abstraction of context-free grammars into regular grammars is an active 
topic of research. COVENANT implements the method described by Nederhof [8] 
for approximating context-free grammars with strongly regular languages. 

We say a grammar is strongly regular if all productions are of the form: 
A — > B w | w or A — > w B | w, where w E S* and A, B are nonterminals. The 
abstraction relies on the following observation: a grammar with productions of 
the form A -A aA /3 with both a, /3 non-empty might not be represented as a 
strongly regular grammar because a and /3 might be related through an “un- 
bounded” communication not expressible by regular languages. The abstraction 
consists of breaking conservatively those unbounded communications in such a 
way that the grammar becomes regular while preserving as much as possible 
the structure of the original grammar. Nederhof [8] also proposed a transfor- 
mation from strongly regular grammars to finite automata also implemented in 
COVENANT. 

Regular Solver. COVENANT currently implements only the naive product con- 
struction for intersecting finite automata but other regular solvers can easily 
be plugged in. In fact, an initial implementation of COVENANT was tested using 
Revenant [4], an efficient regular solver based on bounded model checking with 
interpolation, though the released version does not incorporate it. 

Refinement. COVENANT implements both the greedy and maximum star-epsilon 
generalizations both described in [ 5 ]. We refer to [ 5 ] for details and describe 
them informally through an example. Assume we have a witness w = aab and 
a context-free language L = { a''b' +1 | i > 0}. The greedy algorithm starts by 
checking whether ITi = a*ab ^ L. Since the query succeeds, W\ is already a safe 
generalization of w wrt. L so we could stop here. However, we can continue and 
check next whether W2 = a*a*b ^ L but b £ L so W2 must be discarded. Next, 
we try whether W3 = a*ab* ^ L but abb € L and thus I-V3 is also not a safe 
generalization. Finally, we query W4 = a*(ab)* ^ L and W5 = (a*(ab)*)* ^ L 
which both succeed. Therefore, starting from aab, our tool can produce the safe 
generalization (a*(ab)*)*. In fact, our tool generated three safe generalizations: 
W\ C W4 C IT5. Although this greedy method is reasonably cheap ( 0 (|u>| 2 )), if 
resources are scarce it can stop at any time, returning either W\ or IT 4 . 

The maximum star-epsilon version is similar to the greedy one except it 
will compute the union of all possible safe generalizations without committing 
to any successful partial generalization. That is, the greedy version started by 
checking ITi and since ITi succeeded the rest of queries were relative to W\. 
The non-greedy version will not commit to ITi but also try other possibilities. 
For instance, aa*b, (aa)*b, (aab)*, and a(ab)* are also safe generalizations from 
which we can keep generalizing. Although more expensive, it is worth mentioning 
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that this version ensures termination of the CEGAR loop whenever the context- 
free grammars are regularly separable. 

For the implementation, two operations are important: (a) intersection be- 
tween a context-free grammar and a finite automaton for checking safe gener- 
alizations, and (b) automata difference for refining the current abstraction by 
discarding a safe generalization W of w. For (a), covenant uses a modified 
version of the efficient pre* algorithm [3] and for (b) it intersects the current 
abstraction with the complement of the determinization of W. Although deter- 
minization of automata can have an exponential size blowup, this behavior is 
rare; we have not seen it during experiments. Based on our experience, it is also 
useful to minimize after (b) has been performed, to keep the abstraction small. 

3.2 Preprocessing and Output 

We require that the input grammars are in the following normal form: A —> BC \ 
B | a | e, where A , B, C are nonterminals and a £ E + , where E is the alphabet 
of the grammar. Any context-free grammar can be converted to this form by a 
linear increase in terms of the size of the original grammar, covenant performs 
this normalization as a preprocessing step but it does not require any further 
(more expensive) normalizations, such as Chomsky Normal Form. 

If COVENANT proves that the language of the grammars are not disjoint it will 
return a witness. The user can set the option — solutions n to ask the solver 
for n solutions. The option — dot will output the automata resulting from the 
initial abstraction, each of the safe generalizations, and the final abstractions 
when emptiness was proven, all in the dot language of the Graphviz package. 


4 Safety Verification of Multithread Programs 

Bouajjani et al. [1] pioneered safety verification of recursive multi-threaded pro- 
grams by reduction to checking the intersection of context free languages for 
emptiness. For lack of space, we refer to [1, 2, 7] for details of the encoding. 

We have tested COVENANT and compared with lcegar [7] using two classes 
of programs: textbook Erlang programs and several variants of a real Bluetooth 
driver. A detailed description of the programs as well as the safety properties can 
be found in [2,7,9]. We ran lcegar with the setting provided by the authors 
and tried with the two available initial abstractions: pseudo- downward closure 
(PDC) and cycle breaking (CB). Table 1 shows the results. The symbol oo means 
a timeout expired after 2 hours. We ran COVENANT with the greedy refinement. 


5 Related Work 

To the best of our knowledge, COVENANT is the first publicly available implemen- 
tation for intersecting context-free grammars ensuring termination for regularly 
separable grammars. Several CEGAR approaches have been proposed before. 
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Program 

COVENANT 

LCEGAR 



PDC 

CB 

SharedMem 

safe 

0.01 

14.37 

24.75 

Mutex 

safe 

0.04 

6.12 

0.14 

RA 

safe 

0.01 

OO 

0.39 

Modified RA 

safe 

0.03 

(X) 

27.90 

TNA 

unsafe 

0.01 

0.02 

0.25 

Banking 

unsafe 

0.01 

OO 

3.36 


(a) Verification of multi-thread Erlang programs 


Program 

COVENANT 

LCEGAR 



PDC 

CB 

Version 1 

unsafe 

0.84 

19.74 

21.04 

Version 2 

unsafe 

0.25 

5560.00 

4852.00 

Version 2 w/ Heuri 

unsafe 

0.11 

44.68 

38.89 

Version 3 (1A2S) 

unsafe 

0.12 

217.74 

217.27 

Version 3 (1A2S) w/ Heuri 

unsafe 

0.05 

6.68 

11.37 

Version 3 (2A1S) 

safe 

0.27 

4185.00 

3981.00 


(b) Verification of multi-thread Bluetooth drivers 


Table 1: Comparison of COVENANT with LCEGAR; times in seconds. All experi- 
ments ran on a single core of a 2.4GHz Core i5-M520 with 8GiB of memory. 


Here, we do not consider the effect of initial approximations, as they do not 
affect the expressiveness of the refinement loop and are easily interchangeable. 

The first CEGAR approach was proposed in [1] based on the concept of 
refinable finite-chain abstraction which consists of computing the series (aj)j>i 
overapproximating the language of a CFG G such that £(ai(G)) D £(« 2 (G)) D 
••• D £(G). Several refinable abstractions were described in [1] although no 
experimental data was provided. Instead, we compare here with the i th -prefix 
abstraction 4 implemented in [2]. In this, ct;(G) is the set of words of G of length 
less than i, together with the set of prefixes of length i of G. We argue that 
the refinement implemented in COVENANT is more expressive as it is not hard 
to find regularly separable languages that cannot be proven so by the i th - prefix 
abstraction. For instance, with Ri = a*b and R 2 = a*c, we have Ri H R 2 = 0, 
while for every length i, the string a* forms a prefix to words in both Ri and 
f ?2 • Therefore the intersection of the two abstractions will always be non-empty. 

The LCEGAR method described in [7] is based on a similar refinement frame- 
work, but the approach differs radically. LCEGAR maintains a pair of context-free 
grammars A\, A 2 , over-approximating the intersection of the original languages. 
At each refinement step, an elementary bounded language Bj is generated from 
each grammar A^. 4 5 The refinement ensures B, Cl A, ^ 0, but Bi is not necessarily 

4 [2] also implemented the i th -suffix abstraction which suffers from same limitations. 

5 An elementary bounded language is a language of the form B = wl . . -wf, where 
each Wi is a (finite) word in E* . 
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either an over- or under-approximation of A*. After that, I = B., (1 ii fl L 2 is 
computed. If I is non-empty, L\ C\L 2 must also be non-empty. If / is empty, then 
the approximations can safely be refined by removing Bi. 

Here a comparison between methods is more involved, and we refer to [5] for 
details. Suffice it to say that the refinements done by LCEGAR and COVENANT 
are incomparable. That is, there are grammars which are not regularly separable 
for which LCEGAR can terminate but covenant cannot, and there are also 
grammars which are regularly separable but LCEGAR cannot terminate. 

Finally, the verification phase in covenant consists of intersecting finite 
automata, for which efficient solvers are available. Instead, LCEGAR intersects 
several context-free grammars and a bounded language which, although decid- 
able, is NP-Complete. In our experience, LCEGAR makes a smaller number of 
refinements than COVENANT, but each refinement in COVENANT is considerably 
cheaper than in LCEGAR, resulting in the better performance. 

6 Conclusions 

The main contributions of this paper have been to describe a tool for intersecting 
context-free grammars, and to show it can be effective for safety verification of 
recursive multi-threaded programs. 
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Appendix: Tool Demonstration 


Installation. The installation process relies on cmake which allows COVENANT 
being installed across different platforms 6 . The only requirements are: 

— to install the boost library 

— to install a reasonable modern C++ compiler version. COVENANT has been 
tested for clang++ 3.2 and g++ 4.8. 


Usage. The options for running covenant are: 

covenant [OPTIONS] INPUT 

where INPUT is a text file containing the context-free grammars to be inter- 
sected. We will describe the most relevant options as they are needed during 
this presentation. All options are visible by typing covenant -h. 

“UN SAT” example. Consider the two contrived regularly-separable context-free 
languages Li = {wcw R \ w £ {a, b}*} and L 2 = {a"cb n }. The content of INPUT 
is the following: 

; ; LI = { wcw~{R} I w \in {a,b}''{*} } 

(SI -> [ "c" ] ; 

51 -> [ "a" SI "a", "b" SI "b" ]; 

) 

;; L2 = { (a~n)c(b~n) I n>0 } 

( 

52 -> [ "a" A2 "b" ] ; 

A2 -> [ "c" ] ; 

A2 -> [ S2 ] 

) 

The nonterminal symbol appearing on the left-hand side in the first grammar 
production is considered the start symbol of the grammar. In the above example 
the start symbols are SI and S2, respectively. Terminal symbols must be between 
double quotes (e.g., “a” and “b”). Each grammar production is of the form A -> 
[ . . . ] ; where A is a nonterminal and ... is a sequence of any nonterminal or 
terminal symbol separated by one or more blanks. We also allow A -> [ . . . , 

. . . ] ; to express two grammar productions with the same left-hand side A. That 
is, SI— > [ "a" SI "a", "b" SI "b" ]; is syntactic sugar for SI -> [ "a" SI 
"a"] ; SI ->[ "b" SI "b" ] ;. Any text after ; ; is considered a comment. 

If we type covenant INPUT — gen=max-gen, where max-gen refers to the 
maximum star-epsilon generalization, we obtain: 

6 COVENANT has been tested only for x86_64. 
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Fig. 1: abstractions . dot: initial regular abstractions 


Finished after 2 cegar iterations. 


UNSAT 


This means that COVENANT proved that the two languages L i and L 2 are dis- 
joint. Similar result and number of iterations are obtained if we run with option 
— gen=greedy (for the greedy generalization). 

We can also view the dot files produced by COVENANT by adding option 
— dot. Figure 1 depicts the initial regular approximations of the context-free 
grammars by means of finite-state automata. Note that in both cases the struc- 
ture of the original context-free grammar is preserved as much as possible. For 
instance, for L 2 , the abstraction comes from the need to “forget” the relation 
between the number of as and bs while still remembering that a symbolic c 
must separate the as and bs. Each spurious witness found during the refinement 
loop is also shown as a finite automata as well as their safe generalizations in 
Figure 2. Note that after the second witness w = acb is found, COVENANT only 
needs to compute a safe generalization of w with respect to L\, since w is already 
recognized by L 2 . Figure 3 shows the final regular approximations which prove 
that L\ and L 2 are disjoint. 

"SAT” example. Consider now the languages L\ = {u; | #a's = #b's,w £ 
(a, b}*} and L 2 = {ww R \ w £ {a, b}*} with the corresponding context-free 
grammars in COVENANT format: 

; ; LI = { w I w \in {a,b}~{*}, #a’s=#b’s } 

( SI -> [] ; 

SI -> [ "a" SI "b" SI] ; 

SI -> [ "b" SI "a" SI] 

) 

; ; L2 = { ww~{R} I w \in {a,b}~{*} } 

( S2 -> ["a" "a", "b" "b" , "a" S2 "a", " 
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b" S2 "b"]) 





Fig. 2: ref inements . dot: witnesses and safe generalizations 


By typing covenant INPUT, we obtain: 

Found a solution after 5 iterations: 
abba 

SAT 


That is, COVENANT found a word abba that is recognized by both languages. 
Moreover, we can ask COVENANT for more solutions, say, five more solutions, by 
typing covenant INPUT — solutions 5: 

Found a solution after 5 iterations: 
abba 

Found a solution after 6 iterations: 
b a a b 

Found a solution after 20 iterations: 
aabbbbaa 
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Fig. 3: proofs. dot: final regular abstractions that prove emptiness 


Found a 
a b a b 
Found a 
abba 


solution after 22 iterations: 
baba 

solution after 23 iterations: 
abba 


SAT 
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